Classifying Unstructured Textual Data Using the Product Score Model: An Alternative Text Mining Algorithm
نویسندگان
چکیده
Unstructured textual data such as students’ essays and life narratives can provide helpful information in educational and psychological measurement, but often contain irregularities and ambiguities, which creates difficulties in analysis. Text mining techniques that seek to extract useful information from textual data sources through identifying interesting patterns are promising. This chapter describes the general procedures of text classification using text mining and presents an alternative machine learning algorithm for text classification, named the product score model (PSM). Using the bag-of-words representation (single words), we conducted a comparative study between PSM and two commonly used classification models, decision tree and naïve Bayes. An application of these three models is illustrated for real textual data. The results showed the PSM performed the most efficiently and stably in classifying text. Implications of these results for the PSM are further discussed and recommendations about its use are given.
منابع مشابه
Mining Interesting Aspects of a Product using Aspect-based Opinion Mining from Product Reviews (RESEARCH NOTE)
As the internet and its applications are growing, E-commerce has become one of its rapid applications. Customers of E-commerce were provided with the opportunity to express their opinion about the product on the web as a text in the form of reviews. In the previous studies, mere founding sentiment from reviews was not helpful to get the exact opinion of the review. In this paper, we have used A...
متن کاملAn Innovative Research Framework on Intelligent Text Data Classification System Using Genetic Algorithm
Recent years have witnessed an astronomical growth in the amount of textual information available both on the web and institutional wise document repositories. As a result, text mining has become extremely prevalent and processing of textual information from such repositories got the focus of the current age researchers. Indeed, in the researcher front of text analysis, there are numerous cutti...
متن کاملUsing Fuzzy LR Numbers in Bayesian Text Classifier for Classifying Persian Text Documents
Text Classification is an important research field in information retrieval and text mining. The main task in text classification is to assign text documents in predefined categories based on documents’ contents and labeled-training samples. Since word detection is a difficult and time consuming task in Persian language, Bayesian text classifier is an appropriate approach to deal with different...
متن کاملUsing Fuzzy LR Numbers in Bayesian Text Classifier for Classifying Persian Text Documents
Text Classification is an important research field in information retrieval and text mining. The main task in text classification is to assign text documents in predefined categories based on documents’ contents and labeled-training samples. Since word detection is a difficult and time consuming task in Persian language, Bayesian text classifier is an appropriate approach to deal with different...
متن کاملUsing Text Analytics to Derive Customer Service Management Benefits from Unstructured Data
The Growth of Text Analytics1 Estimates suggest that about 80% of today’s enterprise data is unstructured.2 Unlike structured data, which is tidy and mostly numeric, unstructured data is often textual and, therefore, messy. Unstructured data comprises documents, emails, instant messages or user posts and comments on social media, and presents a challenge to data miners; analyzing unstructured d...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2012